
Table 3.1: Performance of the matching model with different weights on the RecSys2016
dataset
(w1, w2)    Map@1   Map@5   Map@10  Map@30  Map@150  RSScore
(0.0, 1.0)  0.0018  0.0018  0.0020  0.0024  0.0028   0.7715
(0.1, 0.9)  0.0018  0.0018  0.0020  0.0024  0.0028   0.7671
(0.2, 0.8)  0.0018  0.0018  0.0020  0.0024  0.0028   0.7677
(0.3, 0.7)  0.0017  0.0017  0.0019  0.0023  0.0026   0.7457
(0.4, 0.6)  0.0015  0.0014  0.0016  0.0019  0.0023   0.6551
(0.5, 0.5)  0.0011  0.0010  0.0012  0.0014  0.0016   0.4970
(0.6, 0.4)  0.0005  0.0005  0.0006  0.0007  0.0008   0.2715
(0.7, 0.3)  0.0002  0.0002  0.0002  0.0003  0.0003   0.1032
(0.8, 0.2)  0.0001  0.0001  0.0001  0.0001  0.0001   0.0376
(0.9, 0.1)  0.0001  0.0000  0.0000  0.0001  0.0001   0.0328
(1.0, 0.0)  0.0000  0.0000  0.0000  0.0000  0.0000   0.0067
which are called impressions, since users can only interact with a finite number of
impressions. If a job post never appears in the system's impressions, no interaction
between the user and that post can be collected. In RecSys2016, impressions
are generated by the existing recommendation system. In CareerBuilder2012, there
is no information about how the impressions shown to users were produced. For that
reason, the existing recommendation algorithms or baselines can influence the measured
performance of the recommendation models in this thesis.
The performance of the user-item matching approach with different weights on the
RecSys2016 and CareerBuilder2012 datasets is shown in Tables 3.1 and 3.2, respectively.
Surprisingly, in RecSys2016, even when seemingly informative non-textual data such as
"discipline id", "industry id", and "career level" are used alongside geographical data
like "region" and "country", the model constructed solely from textual data,
(w1, w2) = (0.0, 1.0), performs best on all evaluation metrics. This might be due to the
large gaps between users and items in "discipline id", "industry id", and "career level"
noted in the RecSys2016 dataset analysis. In CareerBuilder2012, (w1, w2) = (0.5, 0.5)
gives the highest performance, which means both non-textual and textual data are
important for the recommendation, even though the non-textual data in CareerBuilder2012
contains only geographical data ("City", "State", and "Country") and none of the other
seemingly informative fields available in RecSys2016. The impact of geographical data
on performance in this dataset might arise because the existing CareerBuilder algorithms
that generated the impressions also used this data type, especially since the
competition suggested using geographical data for its baseline approach.
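The role of the weight pair can be illustrated with a minimal sketch. It assumes that the final user-item score is a weighted sum of a non-textual similarity (weight w1) and a textual similarity (weight w2), which is consistent with (0.0, 1.0) being the text-only setting but is not stated as a formula on this page. The function names and the toy similarity values are hypothetical and do not come from either dataset.

```python
def combine_scores(non_text_sim, text_sim, w1, w2):
    """Weighted sum of two per-item similarity lists for a single user.

    Assumption: w1 weights the non-textual similarity and w2 the textual one.
    """
    return [w1 * a + w2 * b for a, b in zip(non_text_sim, text_sim)]


def rank_items(scores, k):
    """Return the indices of the k highest-scoring items, best first."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]


def weight_grid(steps=10):
    """Enumerate (w1, w2) pairs with w1 + w2 = 1, from (0.0, 1.0) to (1.0, 0.0)."""
    return [(i / steps, (steps - i) / steps) for i in range(steps + 1)]


if __name__ == "__main__":
    # Toy similarities of one user to three job posts (illustrative values only).
    non_text_sim = [0.9, 0.1, 0.5]
    text_sim = [0.2, 0.8, 0.4]
    for w1, w2 in weight_grid():
        scores = combine_scores(non_text_sim, text_sim, w1, w2)
        print((w1, w2), rank_items(scores, k=2))
```

In this toy setting the ranking changes as weight moves from the textual to the non-textual similarity, which is the effect the weight sweep in the tables is measuring on the real data.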
Tables 3.3 and 3.4 report the content-based recommendation system's evaluation
metrics with different weights. The model constructed using the RecSys2016 dataset
improves significantly over the user-item matching approach. The optimal (w1,
w2) pair is (0.1, 0.9), which is close to the one in the matching approach
and still relies mostly on textual data. This suggests that there might be a semantic